feat(rag): add VoyageAI voyage-context-4 contextualized embedding support by fzowl · Pull Request #6368 · crewAIInc/crewAI

fzowl · 2026-06-27T13:31:25Z

Add support for VoyageAI models: voyage-context-4

feat(rag): add VoyageAI voyage-context-4 contextualized embedding support
Fix JSON crew version pin (#6342)

Summary by CodeRabbit

New Features
- Added support for VoyageAI contextualized embedding models, routing requests to contextualized embedding when the selected model indicates contextual behavior.
- Introduced a contextual chunk-size setting to better handle large inputs.
Tests
- Added coverage to verify correct routing between standard and contextualized embedding calls.
- Added assertions for contextualized input wrapping (list-of-lists) and output flattening, plus validation of contextualization-specific parameters.

…port

Route voyage-context-* models through the contextualized embeddings endpoint (client.contextualized_embed) with chunk_size set to the 32000 maximum. Input is passed as a flat list of strings and the per-document chunk embeddings are flattened into the returned vectors.

coderabbitai · 2026-06-27T13:33:27Z

📝 Walkthrough


Adds support for VoyageAI voyage-context* models in VoyageAIEmbeddingFunction, with a new chunk-size constant and a branch that uses contextualized_embed for contextual inputs. Tests cover standard routing, contextualized routing, flattened output, and input wrapping.

Changes

VoyageAI contextualized embedding

| Layer / File(s) | Summary |
| --- | --- |
| Contextualized embed routing and flattening `lib/crewai/src/crewai/rag/embeddings/providers/voyageai/embedding_callable.py` | Adds `CONTEXTUALIZED_CHUNK_SIZE = 32000` and routes `voyage-context*` models to `contextualized_embed`; other models continue through `embed` via a local `model` variable. |
| Embedding routing and input-shape tests `lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py` | Adds tests for standard-model `embed` calls, contextualized-model routing, flattened contextualized results, wrapped contextualized inputs, and single-string normalization. |
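The routing summarized in the table can be sketched roughly as below. This is an illustration only, not the PR's code: `FakeClient` and `embed_texts` are hypothetical names, and the stub merely mimics the response shapes the review refers to (`embeddings` for `embed`, `results[*].embeddings` for `contextualized_embed`).

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in for the VoyageAI client; records which method ran."""

    def __init__(self):
        self.calls = []

    def embed(self, texts, model, input_type=None):
        self.calls.append("embed")
        # One vector per input text.
        return SimpleNamespace(embeddings=[[0.0, 0.0] for _ in texts])

    def contextualized_embed(self, inputs, model, input_type=None):
        self.calls.append("contextualized_embed")
        # One result per document, one vector per chunk in that document.
        return SimpleNamespace(
            results=[
                SimpleNamespace(embeddings=[[1.0, 1.0] for _ in doc])
                for doc in inputs
            ]
        )


def embed_texts(client, model, texts):
    """Route voyage-context* models to contextualized_embed, others to embed."""
    if model.startswith("voyage-context"):
        # Assumption: each string is wrapped as its own single-chunk document.
        result = client.contextualized_embed(
            inputs=[[text] for text in texts], model=model, input_type="document"
        )
        return [vec for doc in result.results for vec in doc.embeddings]
    result = client.embed(texts=texts, model=model, input_type="document")
    return list(result.embeddings)
```

Keeping the branch on a simple prefix check means any future `voyage-context-*` model name would take the same path without further changes, which appears to be the intent of the `voyage-context*` wording above.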

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: adding VoyageAI voyage-context-4 contextualized embedding support. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py (1)

30-49: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover the single-document multi-chunk shape directly.

This test only proves concatenation across result.results. The new behavior that matters here is flattening result.results[*].embeddings when one input auto-chunks into multiple vectors, so a fixture with a single result containing two embeddings would make the regression target much clearer.

Suggested test tweak

     mock_client.contextualized_embed.return_value = MagicMock(
         results=[
-            MagicMock(embeddings=[[0.1, 0.2]]),
-            MagicMock(embeddings=[[0.3, 0.4]]),
+            MagicMock(embeddings=[[0.1, 0.2], [0.3, 0.4]]),
         ]
     )
@@
-    result = fn(["aa", "bb"])
+    result = fn(["aa"])
@@
     mock_client.embed.assert_not_called()
     mock_client.contextualized_embed.assert_called_once()
     assert np.allclose(result, [[0.1, 0.2], [0.3, 0.4]])
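To make the distinction in this nitpick concrete, here is a small sketch of the flattening step on both response shapes. `flatten_contextualized` is a hypothetical helper name, and `SimpleNamespace` stands in for the mocked response objects; the real callable may be structured differently.

```python
from types import SimpleNamespace


def flatten_contextualized(result):
    """Flatten result.results[*].embeddings into a single list of vectors."""
    return [vector for doc in result.results for vector in doc.embeddings]


# Shape used by the original test: two results, one vector each.
two_results = SimpleNamespace(
    results=[
        SimpleNamespace(embeddings=[[0.1, 0.2]]),
        SimpleNamespace(embeddings=[[0.3, 0.4]]),
    ]
)

# Shape suggested by the review: one result holding two chunk vectors.
one_result_two_chunks = SimpleNamespace(
    results=[SimpleNamespace(embeddings=[[0.1, 0.2], [0.3, 0.4]])]
)

flat_a = flatten_contextualized(two_results)
flat_b = flatten_contextualized(one_result_two_chunks)
```

Both shapes flatten to the same two vectors, so only the single-result fixture actually exercises the inner loop over `embeddings`; a callable that took just `embeddings[0]` per result would still pass the two-result version.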

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py` around
lines 30 - 49, Update the contextualized embedding test in
VoyageAIEmbeddingFunction so it directly covers the single-document multi-chunk
case: keep the existing VoyageAIEmbeddingFunction and contextualized_embed
setup, but make the mocked contextualized_embed response return one result whose
embeddings contains multiple vectors, then assert the callable flattens
result.results[*].embeddings into the final output. This should replace the
current multi-result concatenation check while still verifying embed is not
called and contextualized_embed is used.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@lib/crewai/src/crewai/rag/embeddings/providers/voyageai/embedding_callable.py`:
- Around line 58-66: The voyage-context path in embedding_callable’s callable
currently hardcodes input_type to document, ignoring any configured value.
Update the contextualized_embed call to use the configured input_type from
self._config when appropriate, or add explicit validation in the embedding
callable to reject non-document configurations for voyage-context models. Make
the fix in the branch that handles model.startswith("voyage-context") so the
behavior matches the selected config.
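The two options named in this comment might look roughly like the following. Both function names are hypothetical, and the sketch assumes the configuration behaves like a plain `dict` with an optional `"input_type"` key, which the page does not confirm.

```python
from typing import Optional


def forward_input_type(config: dict) -> Optional[str]:
    """Option 1: pass the configured input_type through unchanged."""
    return config.get("input_type")


def require_document_input_type(config: dict) -> str:
    """Option 2: reject non-document configurations for voyage-context* models."""
    input_type = config.get("input_type") or "document"
    if input_type != "document":
        raise ValueError(
            f"voyage-context models only accept input_type='document' here, got {input_type!r}"
        )
    return input_type
```

Either way the behavior becomes explicit: the first respects a configured `"query"` value, while the second fails loudly instead of silently overriding it.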



📥 Commits

Reviewing files that changed from the base of the PR and between 6491f5a and 042bd71.

📒 Files selected for processing (2)

lib/crewai/src/crewai/rag/embeddings/providers/voyageai/embedding_callable.py
lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py

- Wrap inputs as List[List[str]] (each string = single-chunk document)
- Remove invalid enable_auto_chunking and chunk_size params
- Use .results[i].embeddings[0] for single-chunk extraction
- Update tests to match correct API contract
- mypy: 0 errors, pytest: 5/5 VoyageAI + 15/15 factory tests pass
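A minimal sketch of the input wrapping and single-chunk extraction listed in this commit message, assuming the response exposes `results[i].embeddings` as described. The helper names are hypothetical and the response object is a stand-in, not the real client's return type.

```python
from types import SimpleNamespace


def wrap_as_single_chunk_documents(texts):
    """Turn a string or list of strings into List[List[str]], one chunk per document."""
    if isinstance(texts, str):
        texts = [texts]  # normalize a bare string to a one-element list
    return [[text] for text in texts]


def first_chunk_embeddings(result):
    """Take embeddings[0] from each result, since every document has one chunk."""
    return [doc.embeddings[0] for doc in result.results]


wrapped = wrap_as_single_chunk_documents(["aa", "bb"])

# Stand-in response: one result per document, one vector per chunk.
fake_result = SimpleNamespace(
    results=[
        SimpleNamespace(embeddings=[[0.1, 0.2]]),
        SimpleNamespace(embeddings=[[0.3, 0.4]]),
    ]
)
vectors = first_chunk_embeddings(fake_result)
```

Because each input string becomes its own document, the output keeps a one-to-one correspondence with the inputs, which is what callers of an embedding function generally expect.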

coderabbitai

🧹 Nitpick comments (1)

lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py (1)

51-107: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a regression test for input_type on voyage-context*.

These tests cover wrapping and flattening, but they never assert what gets passed when the function is configured with input_type="query". That gap is why the current hardcoded "document" behavior can slip through.

🧪 Minimal test to add

+    def test_contextualized_model_forwards_input_type(self):
+        with patch("voyageai.Client") as mock_client_class:
+            mock_client = MagicMock()
+            mock_client_class.return_value = mock_client
+            mock_client.contextualized_embed.return_value = MagicMock(
+                results=[MagicMock(embeddings=[[0.1, 0.2]])]
+            )
+
+            fn = VoyageAIEmbeddingFunction(
+                api_key="voyage-key",
+                model="voyage-context-4",
+                input_type="query",
+            )
+            fn(["aa"])
+
+            _, kwargs = mock_client.contextualized_embed.call_args
+            assert kwargs["input_type"] == "query"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py` around
lines 51 - 107, Add a regression test around VoyageAIEmbeddingFunction for
voyage-context* models when configured with input_type="query"; the current
contextualized embedding tests only cover document-style wrapping and miss this
branch. Extend the existing test module by creating a case that instantiates
VoyageAIEmbeddingFunction with input_type="query", invokes it, and asserts the
mocked voyageai.Client.contextualized_embed call receives the query-specific
input_type rather than the hardcoded document value. Keep the assertions aligned
with the existing contextualized_embed call_args checks so the new test clearly
guards the input_type wiring.


📥 Commits

Reviewing files that changed from the base of the PR and between 042bd71 and 2971943.

📒 Files selected for processing (2)

lib/crewai/src/crewai/rag/embeddings/providers/voyageai/embedding_callable.py
lib/crewai/tests/rag/embeddings/test_voyageai_embedding_callable.py

…ional dep

The VoyageAI callable tests patched `voyageai.Client` directly, which raises ModuleNotFoundError in CI where `voyageai` is an optional, uninstalled dependency. Inject a mock `voyageai` module into `sys.modules` instead so the lazy `import voyageai` inside `VoyageAIEmbeddingFunction.__init__` resolves regardless of whether the package is present.
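The `sys.modules` injection described in this commit message can be sketched with `unittest.mock.patch.dict`. Here `make_client` is a hypothetical stand-in for a constructor that lazily imports `voyageai`; the actual test module may set this up differently (for example via a fixture).

```python
import sys
from unittest.mock import MagicMock, patch


def make_client(api_key):
    """Hypothetical stand-in for code that imports voyageai lazily."""
    import voyageai  # resolved from sys.modules, so the injected mock is used

    return voyageai.Client(api_key=api_key)


def build_with_mocked_voyageai():
    """Run make_client with a mock voyageai module injected into sys.modules."""
    mock_module = MagicMock()
    # patch.dict restores sys.modules on exit, removing the fake entry again
    # if the real package was never installed.
    with patch.dict(sys.modules, {"voyageai": mock_module}):
        client = make_client("voyage-key")
    return mock_module, client
```

Unlike `patch("voyageai.Client")`, this never needs to import the real package to find the attribute being patched, so it behaves the same whether or not `voyageai` is installed.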

coderabbitai Bot reviewed Jun 27, 2026


Comment thread lib/crewai/src/crewai/rag/embeddings/providers/voyageai/embedding_callable.py Outdated

fzowl force-pushed the feat/embedding-model-voyage-context-4 branch from 99e5301 to 2971943 Compare June 29, 2026 13:53

coderabbitai Bot reviewed Jun 29, 2026


fzowl and others added 3 commits June 29, 2026 16:02

Merge branch 'main' into feat/embedding-model-voyage-context-4

91bce5a

Merge branch 'main' into feat/embedding-model-voyage-context-4

fcaa973

fzowl force-pushed the feat/embedding-model-voyage-context-4 branch from 952a11c to d595c9e Compare June 30, 2026 12:50

Merge branch 'main' into feat/embedding-model-voyage-context-4

2ec3846

fzowl wants to merge 6 commits into crewAIInc:main from fzowl:feat/embedding-model-voyage-context-4
